24 research outputs found
Robust Regression via Hard Thresholding
We study the problem of Robust Least Squares Regression (RLSR) where several
response variables can be adversarially corrupted. More specifically, for a
data matrix X \in R^{p x n} and an underlying model w*, the response vector is
generated as y = X'w* + b where b \in R^n is the corruption vector supported
over at most C.n coordinates. Existing exact recovery results for RLSR focus
solely on L1-penalty based convex formulations and impose relatively strict
model assumptions such as requiring the corruptions b to be selected
independently of X.
In this work, we study a simple hard-thresholding algorithm called TORRENT
which, under mild conditions on X, can recover w* exactly even if b corrupts
the response variables in an adversarial manner, i.e. both the support and
entries of b are selected adversarially after observing X and w*. Our results
hold under deterministic assumptions which are satisfied if X is sampled from
any sub-Gaussian distribution. Finally unlike existing results that apply only
to a fixed w*, generated independently of X, our results are universal and hold
for any w* \in R^p.
Next, we propose gradient descent-based extensions of TORRENT that can scale
efficiently to large scale problems, such as high dimensional sparse recovery
and prove similar recovery guarantees for these extensions. Empirically we find
TORRENT, and more so its extensions, offering significantly faster recovery
than the state-of-the-art L1 solvers. For instance, even on moderate-sized
datasets (with p = 50K) with around 40% corrupted responses, a variant of our
proposed method called TORRENT-HYB is more than 20x faster than the best L1
solver.Comment: 24 pages, 3 figure
Locally Non-linear Embeddings for Extreme Multi-label Learning
The objective in extreme multi-label learning is to train a classifier that
can automatically tag a novel data point with the most relevant subset of
labels from an extremely large label set. Embedding based approaches make
training and prediction tractable by assuming that the training label matrix is
low-rank and hence the effective number of labels can be reduced by projecting
the high dimensional label vectors onto a low dimensional linear subspace.
Still, leading embedding approaches have been unable to deliver high prediction
accuracies or scale to large problems as the low rank assumption is violated in
most real world applications.
This paper develops the X-One classifier to address both limitations. The
main technical contribution in X-One is a formulation for learning a small
ensemble of local distance preserving embeddings which can accurately predict
infrequently occurring (tail) labels. This allows X-One to break free of the
traditional low-rank assumption and boost classification accuracy by learning
embeddings which preserve pairwise distances between only the nearest label
vectors.
We conducted extensive experiments on several real-world as well as benchmark
data sets and compared our method against state-of-the-art methods for extreme
multi-label classification. Experiments reveal that X-One can make
significantly more accurate predictions then the state-of-the-art methods
including both embeddings (by as much as 35%) as well as trees (by as much as
6%). X-One can also scale efficiently to data sets with a million labels which
are beyond the pale of leading embedding methods
On the Sensitivity of Reward Inference to Misspecified Human Models
Inferring reward functions from human behavior is at the center of value
alignment - aligning AI objectives with what we, humans, actually want. But
doing so relies on models of how humans behave given their objectives. After
decades of research in cognitive science, neuroscience, and behavioral
economics, obtaining accurate human models remains an open research topic. This
begs the question: how accurate do these models need to be in order for the
reward inference to be accurate? On the one hand, if small errors in the model
can lead to catastrophic error in inference, the entire framework of reward
learning seems ill-fated, as we will never have perfect models of human
behavior. On the other hand, if as our models improve, we can have a guarantee
that reward accuracy also improves, this would show the benefit of more work on
the modeling side. We study this question both theoretically and empirically.
We do show that it is unfortunately possible to construct small adversarial
biases in behavior that lead to arbitrarily large errors in the inferred
reward. However, and arguably more importantly, we are also able to identify
reasonable assumptions under which the reward inference error can be bounded
linearly in the error in the human model. Finally, we verify our theoretical
insights in discrete and continuous control tasks with simulated and human
data.Comment: 17 pages, 12 figure
Congested Bandits: Optimal Routing via Short-term Resets
For traffic routing platforms, the choice of which route to recommend to a
user depends on the congestion on these routes -- indeed, an individual's
utility depends on the number of people using the recommended route at that
instance. Motivated by this, we introduce the problem of Congested Bandits
where each arm's reward is allowed to depend on the number of times it was
played in the past timesteps. This dependence on past history of
actions leads to a dynamical system where an algorithm's present choices also
affect its future pay-offs, and requires an algorithm to plan for this. We
study the congestion aware formulation in the multi-armed bandit (MAB) setup
and in the contextual bandit setup with linear rewards. For the multi-armed
setup, we propose a UCB style algorithm and show that its policy regret scales
as . For the linear contextual bandit setup, our
algorithm, based on an iterative least squares planner, achieves policy regret
. From an experimental standpoint, we
corroborate the no-regret properties of our algorithms via a simulation study.Comment: Published at ICML 202